Bayesian Graphical Entity Resolution using Exchangeable Random Partition Priors

Abstract

Entity resolution (record linkage or deduplication) is the process of identifying and linking duplicate records in databases. In this paper, we propose a Bayesian graphical approach for entity resolution that links records to latent entities, where the prior representation on the linkage structure is exchangeable. First, we adopt a flexible and tractable set of priors for the linkage structure, which corresponds to a special class of random partition models. Second, we propose a more realistic distortion model for categorical/discrete record attributes, which corrects a logical inconsistency with the standard hit-miss model. Third, we incorporate hyperpriors to improve flexibility. Fourth, we employ a partially collapsed Gibbs sampler for inferential speedups. Using a selection of private and nonprivate data sets, we investigate the impact of our modeling contributions and compare our model with two alternative Bayesian models. In addition, we conduct a simulation study for household survey data, where we vary distortion, duplication rates, and data set size. We find that our model performs more consistently than the alternatives across a variety of scenarios and typically achieves the highest entity resolution accuracy (F1 score). Open source software is available for our proposed methodology, and we provide a discussion regarding our work and future directions.
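
The abstract reports accuracy as an F1 score. As an illustration only, the sketch below computes a pairwise F1 score, a common way to score entity resolution output, by comparing the record pairs linked in a predicted clustering against those linked in the ground truth. That the paper uses this pairwise variant is an assumption, and the names linked_pairs and pairwise_f1 and the toy labels are hypothetical, not taken from the paper or its software.

```python
from itertools import combinations


def linked_pairs(entity_ids):
    """Return the set of record-index pairs assigned to the same entity."""
    clusters = {}
    for record, entity in enumerate(entity_ids):
        clusters.setdefault(entity, []).append(record)
    pairs = set()
    for members in clusters.values():
        # Every unordered pair of records within a cluster counts as a link.
        pairs.update(combinations(members, 2))
    return pairs


def pairwise_f1(true_ids, predicted_ids):
    """Pairwise F1 between a true and a predicted assignment of records to entities."""
    if len(true_ids) != len(predicted_ids):
        raise ValueError("both assignments must cover the same records")
    true_pairs = linked_pairs(true_ids)
    pred_pairs = linked_pairs(predicted_ids)
    if not true_pairs and not pred_pairs:
        return 1.0  # no links to find and none predicted
    true_positives = len(true_pairs & pred_pairs)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_pairs)
    recall = true_positives / len(true_pairs)
    return 2 * precision * recall / (precision + recall)


# Toy example: five records, three true entities (labels are arbitrary).
true_ids = ["a", "a", "b", "b", "c"]
predicted_ids = [0, 0, 0, 1, 2]
print(pairwise_f1(true_ids, predicted_ids))
```

In the toy example, the prediction recovers one of the two true links and adds two false ones, so precision is 1/3 and recall is 1/2. Because the score depends only on which records are grouped together, it is unaffected by how the entity labels themselves are named.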

Related resources

Adaptive Bayesian Procedures Using Random Series Priors

We consider a general class of prior distributions for nonparametric Bayesian estimation which uses finite random series with a random number of terms. A prior is constructed through distributions on the number of basis functions and the associated coefficients. We derive a general result on adaptive posterior contraction rates for all smoothness levels of the target function in the true model ...

Robust Bayesian analysis given priors on partition

In Bayesian analysis, suppose that probability measures can be specified over the subsets partitioning the parameter space, so that they are to be joined in a unique prior measure, defined over the whole parameter space, according to some weights. Should the weights be uncertain, then the class of all the probability measures compatible with such uncertainty is to be specified instead. Situations in which such a c...

Random Projections with Bayesian Priors

The technique of random projection is one of dimension reduction, where high dimensional vectors in R^D are projected down to a smaller subspace in R^k. Certain forms of distances or distance kernels such as Euclidean distances, inner products [10], and l_p distances [12] between high dimensional vectors are approximately preserved in this smaller dimensional subspace. Word vectors which are repre...

Entity Resolution with Empirically Motivated Priors

Databases often contain corrupted, degraded, and noisy data with duplicate entries across and within each database. Such problems arise in citations, medical databases, genetics, human rights databases, and a variety of other applied settings. The target of statistical inference can be viewed as an unsupervised problem of determining the edges of a bipartite graph that links the observed record...

Priors on exchangeable directed graphs

Directed graphs occur throughout statistical modeling of networks, and exchangeability is a natural assumption when the ordering of vertices does not matter. There is a deep structural theory for exchangeable undirected graphs, which extends to the directed case via measurable objects known as digraphons. Using digraphons, we first show how to construct models for exchangeable directed graphs, ...

Journal

Journal title: Journal of Survey Statistics and Methodology

Year: 2023

ISSN: 2325-0984, 2325-0992

DOI: https://doi.org/10.1093/jssam/smac030